Gradient methods for convex minimization: better rates under weaker conditions
The convergence behavior of gradient methods for minimizing convex
differentiable functions is one of the core questions in convex optimization.
This paper shows that their well-known complexities can be achieved under
conditions weaker than the commonly accepted ones. We relax the common gradient
Lipschitz-continuity condition and strong convexity condition to ones that hold
only over certain line segments. Specifically, we establish complexities
$O(\frac{R}{\epsilon})$ and $O(\sqrt{\frac{R}{\epsilon}})$ for the ordinary and
accelerated gradient methods, respectively, assuming that $\nabla f$ is
Lipschitz continuous with constant $R$ over the line segment joining $x$ and
$x-\frac{1}{R}\nabla f(x)$ for each $x\in\operatorname{dom} f$. Then we improve them to
$O(\frac{R}{\nu}\log\frac{1}{\epsilon})$ and
$O(\sqrt{\frac{R}{\nu}}\log\frac{1}{\epsilon})$ for a function $f$ that also
satisfies the secant inequality $\langle\nabla f(x), x-x_{\mathrm{prj}}\rangle\ge\nu\|x-x_{\mathrm{prj}}\|^2$
for each $x\in\operatorname{dom} f$ and its projection $x_{\mathrm{prj}}$ onto the minimizer set of $f$.
The secant condition is also shown to be necessary for the geometric decay of
solution error. Not only are the relaxed conditions met by more functions, but
the restrictions also give a smaller $R$ and a larger $\nu$ than the unrestricted
conditions do, and thus lead to better complexity bounds. We apply these results
to sparse optimization and demonstrate a faster algorithm.
Comment: 20 pages, 4 figures, typos are corrected, Theorem 2 is ne
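The two iterations and the secant inequality discussed in this abstract can be tried on a toy problem. The following is a minimal NumPy sketch, not the authors' code: it runs the ordinary gradient method with step 1/L and a FISTA-style accelerated variant (a swapped-in, standard form of acceleration) on an underdetermined least-squares objective, which is convex but not strongly convex, and then evaluates the secant ratio at random points. The toy problem, all function names, and the use of the global Lipschitz constant L = ||A||_2^2 are assumptions for illustration; a global constant also satisfies the restricted line-segment condition, although the restricted constant R may be smaller.

```python
import numpy as np

def gradient_method(grad, x0, L, iters):
    """Ordinary gradient method with the fixed step size 1/L."""
    x = x0.copy()
    for _ in range(iters):
        x = x - grad(x) / L
    return x

def accelerated_gradient(grad, x0, L, iters):
    """Nesterov-type accelerated gradient (FISTA momentum, no prox term)."""
    x_prev, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x = y - grad(y) / L                          # gradient step at the extrapolated point
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)  # momentum (extrapolation) step
        x_prev, t = x, t_next
    return x_prev

# Toy problem (assumed): underdetermined least squares, convex but NOT strongly
# convex, because A has a 20-dimensional null space along which f is flat.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 60)) @ np.diag(np.linspace(0.01, 1.0, 60))
b = rng.standard_normal(40)
A_pinv = np.linalg.pinv(A)

def f(x):
    return 0.5 * np.sum((A @ x - b) ** 2)

def grad(x):
    return A.T @ (A @ x - b)

def project_to_minimizers(x):
    # Euclidean projection onto the minimizer set {x : A^T A x = A^T b}.
    return x - A_pinv @ (A @ x - b)

L = np.linalg.norm(A, 2) ** 2               # global Lipschitz constant of grad f
s = np.linalg.svd(A, compute_uv=False)
nu = s[s > 1e-10 * s[0]].min() ** 2         # smallest positive eigenvalue of A^T A

x0 = np.zeros(60)
x_star = project_to_minimizers(x0)          # minimizer closest to x0
f_star = f(x_star)
for k in (10, 100, 1000):
    gap_gd = f(gradient_method(grad, x0, L, k)) - f_star
    gap_agd = f(accelerated_gradient(grad, x0, L, k)) - f_star
    print(f"k={k:5d}  GD gap={gap_gd:.3e}  AGD gap={gap_agd:.3e}")

# Secant inequality <grad f(x), x - x_prj> >= nu * ||x - x_prj||^2 at random points.
for _ in range(5):
    x = rng.standard_normal(60)
    d = x - project_to_minimizers(x)
    print("secant ratio / nu =", (grad(x) @ d) / (d @ d) / nu)
```

For this particular problem the printed secant ratio divided by nu should not fall below 1: the gradient equals A^T A applied to x - x_prj, and that displacement lies in the row space of A, where A^T A is bounded below by its smallest positive eigenvalue.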
A fast patch-dictionary method for whole image recovery
Various algorithms have been proposed for dictionary learning. Among those
for image processing, many use image patches to form dictionaries. This paper
focuses on whole-image recovery from corrupted linear measurements. We address
the open issue of representing an image by overlapping patches: the overlapping
leads to an excessive number of dictionary coefficients to determine. With very
few exceptions, this issue has limited the applications of image-patch methods
to local tasks such as denoising, inpainting, cartoon-texture
decomposition, super-resolution, and image deblurring, for which one can
process a few patches at a time. Our focus is on global imaging tasks such as
compressive sensing and medical image recovery, where the whole image is
encoded together, making it either impossible or very ineffective to update a
few patches at a time.
Our strategy is to divide the sparse recovery into multiple subproblems, each
of which handles a subset of non-overlapping patches, and then to average the
results of the subproblems to yield the final recovery. This simple strategy
is surprisingly effective in terms of both quality and speed. In addition, we
accelerate the computation of the learned dictionary by applying a recent block
proximal-gradient method, which, compared to the current state of the art, not
only has a lower per-iteration complexity but also takes fewer iterations to
converge. We also establish that our algorithm globally converges to a
stationary point. Numerical results on synthetic data demonstrate that our
algorithm can recover a more faithful dictionary than two state-of-the-art
methods.
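The averaging strategy described above can be illustrated on a 1-D toy signal. In this hedged sketch the measurement operator is assumed to be the identity (plain denoising), so each subproblem, which covers one circularly shifted tiling of non-overlapping patches, decouples into independent per-patch problems. The per-patch solver is hypothetical: it keeps the k largest coefficients in an orthonormal DCT basis instead of a learned dictionary. The patch size, offsets, circular wrap-around, and all function names are assumptions; the paper's subproblems are global recoveries from linear measurements.

```python
import numpy as np

def dct_matrix(p):
    """Orthonormal DCT-II basis of size p (rows are atoms, so D @ D.T == I)."""
    n = np.arange(p)
    D = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * p))
    D[0] *= np.sqrt(1.0 / p)
    D[1:] *= np.sqrt(2.0 / p)
    return D

def sparsify_patch(patch, D, k):
    """Hypothetical per-patch solver: keep the k largest-magnitude coefficients."""
    c = D @ patch
    order = np.argsort(np.abs(c))
    c[order[: len(c) - k]] = 0.0     # zero all but the k largest
    return D.T @ c

def averaged_nonoverlapping_recovery(x, p, k, offsets):
    """Average the estimates obtained from several shifted non-overlapping tilings."""
    n = len(x)
    assert n % p == 0, "signal length must be a multiple of the patch size"
    D = dct_matrix(p)
    estimates = []
    for s in offsets:
        shifted = np.roll(x, -s)             # shifted[i] == x[(i + s) % n]
        patches = shifted.reshape(-1, p)     # one tiling: non-overlapping patches
        rec = np.concatenate([sparsify_patch(q, D, k) for q in patches])
        estimates.append(np.roll(rec, s))    # undo the shift
    return np.mean(estimates, axis=0)        # average the subproblem results

# Usage on a noisy sinusoid (toy data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 128, endpoint=False)
clean = np.sin(2 * np.pi * 3 * t)
noisy = clean + 0.2 * rng.standard_normal(t.size)
est = averaged_nonoverlapping_recovery(noisy, p=8, k=2, offsets=range(0, 8, 2))
print("noisy error:   ", np.linalg.norm(noisy - clean))
print("averaged error:", np.linalg.norm(est - clean))
```

Each tiling alone has block artifacts at its patch borders; averaging tilings with different offsets places those borders at different positions, which is the intuition behind combining the subproblem results.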
Combining our whole-image recovery and dictionary-learning methods, we
numerically simulate image inpainting, compressive sensing recovery, and
deblurring. Our recovery is more faithful than those of a total variation
method and a method based on overlapping patches.
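Dictionary learning with block proximal-gradient updates can be sketched as alternating a proximal-gradient (soft-thresholding) step on the sparse codes X with a projected-gradient step on the dictionary D. This is an assumption-laden illustration, not the paper's algorithm: the model min 0.5*||Y - D X||_F^2 + lam*||X||_1 with atoms in the unit ball, the step sizes 1/L from the exact block Lipschitz constants, and all names are hypothetical, and the plain alternation below omits any extrapolation or other refinements the cited method may use.

```python
import numpy as np

def soft_threshold(Z, tau):
    """Proximal operator of tau * ||.||_1 (entrywise shrinkage)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

def project_unit_columns(D):
    """Project each column of D onto the Euclidean unit ball."""
    norms = np.maximum(np.linalg.norm(D, axis=0), 1.0)
    return D / norms

def objective(Y, D, X, lam):
    return 0.5 * np.linalg.norm(Y - D @ X) ** 2 + lam * np.abs(X).sum()

def block_prox_grad_dictionary(Y, n_atoms, lam, iters, seed=0):
    rng = np.random.default_rng(seed)
    D = project_unit_columns(rng.standard_normal((Y.shape[0], n_atoms)))
    X = np.zeros((n_atoms, Y.shape[1]))
    history = [objective(Y, D, X, lam)]
    for _ in range(iters):
        # Block 1: prox-gradient step on the codes X with step 1/||D^T D||_2.
        Lx = max(np.linalg.norm(D.T @ D, 2), 1e-12)
        X = soft_threshold(X - D.T @ (D @ X - Y) / Lx, lam / Lx)
        # Block 2: projected-gradient step on D with step 1/||X X^T||_2.
        Ld = max(np.linalg.norm(X @ X.T, 2), 1e-12)
        D = project_unit_columns(D - (D @ X - Y) @ X.T / Ld)
        history.append(objective(Y, D, X, lam))
    return D, X, history

# Usage on synthetic data: Y is generated from a random dictionary and sparse codes.
rng = np.random.default_rng(1)
D_true = project_unit_columns(rng.standard_normal((16, 24)))
X_true = rng.standard_normal((24, 200)) * (rng.random((24, 200)) < 0.1)
Y = D_true @ X_true
D, X, hist = block_prox_grad_dictionary(Y, n_atoms=24, lam=0.1, iters=300)
print("objective: start", hist[0], "end", hist[-1])
```

Because each block step is a proximal-gradient step with step size 1/L for that block, the objective recorded in `hist` should be nonincreasing; this monotonicity is a property of the sketch, not a statement about the paper's convergence result.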
Block stochastic gradient iteration for convex and nonconvex optimization
The stochastic gradient (SG) method can minimize an objective function
composed of a large number of differentiable functions, or solve a stochastic
optimization problem, to a moderate accuracy. The block coordinate
descent/update (BCD) method, on the other hand, handles problems with multiple
blocks of variables by updating them one at a time; when the blocks of
variables are easier to update individually than together, BCD has a lower
per-iteration cost. This paper introduces a method that combines the features
of SG and BCD for problems with many components in the objective and with
multiple (blocks of) variables.
Specifically, a block stochastic gradient (BSG) method is proposed for
solving both convex and nonconvex programs. At each iteration, BSG approximates
the gradient of the differentiable part of the objective by randomly sampling a
small set of data or sampling a few functions from the sum term in the
objective, and then, using those samples, it updates all the blocks of
variables in either a deterministic or a randomly shuffled order. Its
convergence is established, in different senses, for both the convex and the
nonconvex cases. In the convex case, the proposed method has the same order of
convergence rate as the SG method. In the nonconvex case, its convergence is
established in terms of the expected violation of a first-order optimality
condition. The proposed method was numerically tested on problems including
stochastic least squares and logistic regression, which are convex, as well as
low-rank tensor recovery and bilinear logistic regression, which are nonconvex
…
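The BSG iteration described above (sample once per iteration, then update every block with that same sample in a fixed or shuffled order) can be sketched for stochastic least squares. This is a minimal illustration under stated assumptions: the objective (1/(2m))*||A x - b||^2, the four-block partition, the minibatch size, and the diminishing step size alpha0/sqrt(k+1) are hypothetical choices, not the paper's parameters, and there is no nonsmooth regularizer.

```python
import numpy as np

def bsg_least_squares(A, b, blocks, iters, batch, alpha0, shuffle=False, seed=0):
    """Block stochastic gradient sketch for min_x (1/(2m)) * ||A x - b||^2."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    for k in range(iters):
        idx = rng.choice(m, size=batch, replace=False)   # one minibatch per iteration
        A_s, b_s = A[idx], b[idx]
        step = alpha0 / np.sqrt(k + 1)                    # assumed diminishing step size
        order = rng.permutation(len(blocks)) if shuffle else range(len(blocks))
        for j in order:
            blk = blocks[j]
            # Minibatch gradient w.r.t. block j at the current x (blocks updated
            # earlier in this sweep are already new), reusing the same sample.
            r = A_s @ x - b_s
            x[blk] -= step * (A_s[:, blk].T @ r) / batch
    return x

# Usage: a consistent stochastic least-squares problem split into 4 blocks.
rng = np.random.default_rng(2)
A = rng.standard_normal((500, 20))
x_true = rng.standard_normal(20)
b = A @ x_true
blocks = np.array_split(np.arange(20), 4)
for shuffle in (False, True):
    x = bsg_least_squares(A, b, blocks, iters=2000, batch=20, alpha0=0.3, shuffle=shuffle)
    rel = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
    print("shuffle =", shuffle, " relative error:", rel)
```

Recomputing the residual inside the block loop is what makes the sweep Gauss-Seidel-like: each block sees the blocks already updated in the same iteration, while the sampled rows stay fixed for the whole sweep.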